Comparison of two validated evidence‐based medicine assessments: Do they correlate?

Abstract
Evidence-based medicine (EBM) has been defined as a process involving five actions: asking, acquiring, appraising, applying, and assessing. Several attempts have been made to create and validate tools to assess EBM aptitude. The newest testing instrument, the ACE tool, which is a 15-question true/false exam, has not been directly compared to the more established Fresno test, which is composed of 12 in-depth short-answer questions. Although both were designed to test Steps 1-4 of the five-step EBM process, it is unclear whether they examine the same things or whether one is superior. To our knowledge, there is no widely used standard for EBM assessment despite the broad requirements for inclusion of EBM in both undergraduate and graduate medical education.

Hypotheses
It was hypothesized that these instruments do not correlate with one another, based on inherent differences between them, including assessment format, grading method, and scoring range. The authors sought to examine whether a correlation between the results of these two instruments exists in a population of U.S. medical students.

Methods
A retrospective cohort study of 158 fourth-year U.S. medical students in academic year 2018-2019 was conducted. All students were exposed to a focused EBM curriculum consisting of three guided discussions of separate journal articles clinically relevant to the practice of emergency medicine. Outcomes measured included scores on both the ACE tool and Fresno test, summarized using descriptive statistics. Spearman's rho was used to determine the correlation between the ACE and Fresno scores for each student among the entire group. A subgroup analysis was performed to assess for correlations at more extreme data points.

Results
The median scores on the ACE tool and Fresno test were 66.7% and 62.7%, respectively. There was no statistically significant correlation between the results of these two assessments (Spearman's rho 0.023, p = 0.774) in our population. The scores from the subgroup of advanced performers on the Fresno test showed a weak but statistically significant positive correlation (p = 0.045) to advanced scores on the ACE tool. No other subgroups showed statistically significant correlation.

Conclusions
In our population of U.S. medical students, the results of two known EBM assessment instruments do not correlate with one another. The assessments may differ in what categories of learning they measure, in generalizability, or perhaps in what depth of understanding they test overall. Further study is needed to determine what each instrument is measuring and whether there is demonstrable variation across populations.


PURPOSE
Evidence-based medicine (EBM) is "the integration of best research evidence with clinical expertise and unique patient values and circumstances." 1 The practice of EBM usually requires the following five steps: (1) translating the uncertainties into answerable questions (asking), (2) searching for and retrieving evidence to address the questions (acquiring), (3) critically appraising the evidence for validity and clinical importance (appraising), (4) applying the appraised evidence to inform the clinical decisions (applying or integrating), and (5) evaluating the performance in the previous four steps (assessing). 2 It is understood that the final step is in effect a measure of improved outcomes based on actual clinical application and is therefore generally not assessed as part of the mission of EBM educational initiatives.
In undergraduate and graduate medical education, education in EBM is required. The Accreditation Council for Graduate Medical Education (ACGME) Common Program Requirements list education in the appraisal of medical evidence as a core didactic activity and as a necessary skill in demonstrating practice-based learning and improvement. 3 The Association of American Medical Colleges (AAMC) lists the ability to appraise and apply evidence as a way of demonstrating competency in Core Entrustable Professional Activities (EPA) #7: the ability to ask clinical questions to advance patient care. 4 However, as with much of the skill model of medicine, a clear and agreed-upon representation of all the skills and behaviors that comprise EBM has proved elusive. The 2011 Sicily statement on classification and development of evidence-based practice learning assessment tools put forward the CREATE framework, a rubric for EBM tools. 5 That group, among others, acknowledges that trainees are rarely taught or assessed on achieved EBM competency on all steps in the EBM process. [5][6][7] Attempts to demonstrate internal validity of several tools' scores have been made, but few have reported on practical differences between tools or compared results in the same population. 8,9
One of these tests, the Fresno test, was developed in 2003 with the goal of evaluating proficiency in EBM through the use of open-ended questions to show higher order thinking. 8 The goal was also to assess the learner in multiple areas of proficiency aside from just self-reported improvement or critical appraisal alone. 8 In this test, learners are presented with a clinical scenario and answer a series of twelve open-ended free-response questions. 8 The best possible score is 212 points. Validation of the test was determined by administering it to family medicine residents and self-reported EBM experts to identify appropriate cutoff scores for the "novice in EBM" versus the expert taking the assessment. 8 When comparing the initial dataset from which the Fresno test was developed to a dataset used for validation of the test, there was good inter-rater reliability. 8 The Fresno test assesses three of the five EBM steps set forth by the Sicily statement described above (ask, acquire, and appraise). 5,8 It is graded with a detailed rubric with each question broken down into multiple parts. Lists are provided of content that the test-taker must include to achieve different levels of scoring (absent, limited, strong, excellent), and points are assigned based on these varying levels. This rubric limits what is left to interpretation by the grader, but some level of subjectivity inherently exists. 8
The Berlin questionnaire is another EBM assessment tool that, like the Fresno test, was developed more than 15 years ago, in 2002, and assesses only three of the five EBM steps. 9 It differs from the Fresno test in that it is multiple choice and designed to follow a brief 3-day EBM course. 9 Reports of the Fresno test and Berlin questionnaire as assessments in undergraduate medical education number fewer than a dozen. [10][11][12][13][14][15][16][17] In 2011, West et al. 12 applied both the Berlin and the Fresno assessments together, but an in-depth comparison between the two was not reported. Lai et al. 14  A cross-sectional study was performed among 342 medical students in Australia with varying levels of training in EBM. 15 The students were divided into cohorts based on this level of training.
The variance and means of scores on the ACE tool were analyzed and the test was determined by authors to be valid and reliable. 15 The authors argue that the strength of this new assessment tool is the ability to evaluate an additional step of the EBM process (application of results). 15 In the ACE tool, learners are presented with a clinical scenario and answer 15 yes-or-no questions. 15 The best possible score is 15.

Buljan et al. 18 subsequently used all three of these instruments, the Fresno test, Berlin questionnaire, and ACE tool, in a population of third-year medical students who took a 1-week course, comparing that cohort's scores to those of a cohort that had not received the intervention. The investigators primarily assessed the postintervention change in scores within each of the three assessments, without focusing on whether the different tools' results correlated with one another.
The Berlin questionnaire was not included in this study because the Fresno and the ACE were more readily available, and adding a third assessment was felt to increase the burden on students while providing no clear benefit to them. Obtaining a copy of the Berlin questionnaire also often requires a direct request to the authors who developed the tool, making it an infeasible option for many. Additionally, because the ACE tool is extremely brief and easy to grade, while the Fresno test takes significantly more time for learners to complete and requires significant time from knowledgeable faculty to grade, the educators felt that demonstrating equivalence of the two instruments in this population would be important for efficiency. Thus, we sought to evaluate whether scores on two of the assessments with prior work demonstrating internal validity, the ACE and Fresno, correlated with one another in a population of U.S. medical students in their mandatory emergency medicine clerkship.

Study design
This study was approved by our institutional review board (Pro2018001988) by noncommittee review for retrospective educational research. It is a retrospective cohort study examining whether a correlation exists between two EBM assessment tools on which prior validation studies have been performed, the ACE tool and Fresno test, in fourth-year medical students exposed to an EBM curriculum. Both assessments were given to each student, and scores were analyzed individually. Three authors of this paper Each conversation was designed to cover aspects of all four assessable steps of EBM.
The articles [19][20][21] pertained to common emergency medicine topics: endovascular therapy for treatment of ischemic stroke, utilization of the HEART score for the management of chest pain, and the incidence of contrast-induced nephropathy. They are, respectively, a systematic review and meta-analysis, a randomized controlled trial, and a retrospective study. The same articles were used for each group of medical students. Over the course of the three sessions, the students were taught the first four steps of EBM.
Students were provided critical appraisal worksheets from The Center for Evidence-Based Medicine and the Critical Appraisal Skills Program to complete prior to the journal club discussion. [22][23][24][25] The authors served as facilitators during these sessions.
They were aided by a facilitator guide created for each article by one of the study authors (AS) as a means of standardizing teaching across sessions (Appendix S1A-C). The discussion questions in these facilitator guides were developed to address the four assessable steps of EBM as described above. There was no formal training given to the facilitators beforehand, and there was no other EBM teaching during the 4-week clerkship. The authors were not blinded to study outcomes.

Participants
All 158 students enrolled in the required emergency medicine clerk-

Assessment administration
At the end of each clerkship block, students completed both the ACE and the Fresno assessments, which were administered together online via Qualtrics (QualtricsXM) as a single examination.
The students were given a total of 75 min to complete the exam, having been told that their score would represent 5% of their total clerkship grade.

Grading process
All assessments were divided and graded by one of two emergency medicine faculty, AK or AS. The ACE assessments were graded following the answer key provided by the original study. The Fresno assessments were graded according to the answer rubric provided by the original study, which includes a point-by-point breakdown of acceptable answers for each question. The time it took to grade each EBM exam type was not measured.

Statistical analysis
All the data available from the cohort of this study were used; therefore, no sample size calculation was done. The volume and pace of grading of the Fresno test made calculation of inter-rater reliability impossible with the resources available. All data were summarized and reported using descriptive statistics such as medians, means, and ranges. Data were tested for normality using the Shapiro-Wilk test.
Spearman's rho was used to determine the correlation between the ACE and the Fresno scores for each student among the entire group.
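As an illustration of the statistic named above, the following is a minimal sketch of Spearman's rho computed as the Pearson correlation of rank-transformed scores, with tied scores given the mean of their ranks. It is not the authors' analysis code, which the paper does not describe; the function names `average_ranks` and `spearman_rho` and the six score pairs are hypothetical, and the data are made up rather than drawn from the study.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one sample is constant")
    return cov / (sx * sy)


# Made-up scores for six hypothetical students (NOT study data).
ace_scores = [10, 9, 12, 8, 11, 10]             # out of 15
fresno_scores = [133, 150, 120, 141, 128, 160]  # out of 212

rho = spearman_rho(ace_scores, fresno_scores)
print(f"Spearman's rho = {rho:.3f}")
```

A rank-based statistic suits this comparison because the two instruments use different scales (15 versus 212 points) and the ACE scores were not normally distributed. The sketch reports only the coefficient; a p-value would normally come from a statistics package, for example `scipy.stats.spearmanr`.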
A subgroup analysis of both high- and low-scoring students was also done to assess for correlations at more extreme data points. (69.3%), respectively. 15 For the Fresno test, test-takers categorized as novice represented a cohort of resident physicians who scored a mean of 95.6 (45%) while those considered advanced test-takers were attending physicians, who scored a mean of 147.5 (69.6%). 18 Since the difference between novice and advanced on the ACE tool was fewer than two points, and scores were reported as whole numbers, it was decided to assign a score of 8 or lower as the novice threshold and 11 or higher as the advanced threshold to create more separation between categories for the purposes of this analysis. As

RESULTS
A total of 158 of 158 eligible fourth-year medical students were included in our analysis. Scores on the ACE tool did not have a normal distribution in our population. The distribution of scores on the Fresno test was normal. On the ACE tool, the median score in this population was 10 out of a possible score of 15 (66.7%). The median score on the Fresno test was 133 out of a possible 212 (62.7%). There was no statistically significant correlation between the ACE and Fresno scores in the entire cohort of students (Spearman's rho 0.023, p = 0.774; Figure 1). We also compared the average percent score of students in our study cohort who answered each question correctly to the original ACE and Fresno validation study cohorts (Table 2). Notably, there was wide variability in scores per question.
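The percentages reported above follow directly from the raw medians and each instrument's maximum score. The short sketch below reproduces that arithmetic and applies the ACE subgroup thresholds stated in the statistical analysis (8 or lower as novice, 11 or higher as advanced). The helper names `percent` and `ace_subgroup` are hypothetical, and the label "intermediate" for scores of 9 or 10 is an assumption, since the paper does not name that middle group.

```python
ACE_MAX = 15      # best possible ACE score
FRESNO_MAX = 212  # best possible Fresno score


def percent(score, max_score):
    """Express a raw score as a percentage of the maximum, to one decimal."""
    return round(100 * score / max_score, 1)


def ace_subgroup(score):
    """Label an ACE score using the thresholds described in the methods."""
    if score <= 8:
        return "novice"
    if score >= 11:
        return "advanced"
    return "intermediate"  # assumed label; the paper does not name this group


print(percent(10, ACE_MAX))      # median ACE score -> 66.7
print(percent(133, FRESNO_MAX))  # median Fresno score -> 62.7
print(ace_subgroup(10))          # the median ACE score falls between thresholds
```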

DISCUSSION
Scores on two validated EBM assessment tools do not correlate with one another in our population of senior medical students. The many structural differences between the tools could explain this finding: the two assessments are scored on different scales, with their own scoring ranges due to their inherently different testing and grading formats. In addition, the cohorts in the ACE and Fresno validation studies were somewhat different from one another (trainees in the Australian medical system versus family practice residents in the western United States, respectively), and both differ from our cohort of fourth-year medical students in the northeastern United States.
Given that the ACE tool is true/false and quite brief, while the Fresno test requires short answers created de novo by each learner, there may be important psychometric reasons for the discrepancy we found, which might be more pronounced at lower skill levels.
This idea is weakly supported by the statistically significant correlation we found between scoring highly on the Fresno test and on the ACE tool. The Fresno test also requires some interpretation on the part of the graders, which could lead to measurement error in one but not the other. It is also possible that our population's prior education or our curriculum favors one assessment disproportionately. While the original Fresno study reported the average score for each part, our study collected the average score for the entire question only.

and vice versa, although more work would need to be done to ex

the majority of the manuscript. Catherine Yu assisted with both the administration and the actual teaching of the EBM curriculum and assisted with statistical analysis, figure creation, and manuscript writing and editing. Ariel Sena assisted with teaching and grading for the EBM curriculum, created the facilitator guides upon which the curriculum is based, graded assessments and assisted with editing and analysis of the data collected, and wrote the original proposal and IRB approval. Naila Ghafoor assisted in the final stages of editing and formatting the manuscript as well as working to resolve some statistical analysis and focus questions. Shannon Moffett was the director of the course in which this curriculum was implemented; assured approval at the school and department level of the curriculum; ensured compliance with IRB requirements for research with students; and assisted in the conception, administration, logistics, data collection, and analysis at all points in the initiation, development, and completion of this project.

CONFLICT OF INTEREST
The authors declare no potential conflict of interest.